-
Notifications
You must be signed in to change notification settings - Fork 240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update the legacy mode check: only take effect when reading date/timestamp column #10074
Conversation
…stamp column Signed-off-by: Chong Gao <[email protected]>
build |
build |
Failed cases decreased from 1071 to 123 on DB. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code LGTM.
Question: I'm a little confused why throwing an exception makes these tests pass, could you list some of the tests that this pr fixes?
Below code is to make less chance to throw an exception.
|
/** | ||
* Class for helper functions for Date and Timestamp | ||
*/ | ||
object DateTypeUtils { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are checking for both date and time, let's call it DateTimeUtils
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, since we already have DateUtils.scala
, I would recommend putting this DateTimeUtils
class code into it, and renaming DateUtils.scala
into dateTimeUtils.scala
too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
* @param predicate predicate for a date type. | ||
* @return true if date type or its children have a true predicate. | ||
*/ | ||
def hasType(t: DataType, predicate: DataType => Boolean): Boolean = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a duplication of org.apache.spark.sql.rapids.execution.TrampolineUtil#dataTypeExistsRecursively
. Can we use that instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -0,0 +1,53 @@ | |||
/* | |||
* Copyright (c) 2023, NVIDIA CORPORATION. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Copyright (c) 2023, NVIDIA CORPORATION. | |
* Copyright (c) 2024, NVIDIA CORPORATION. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
build |
b83435f
to
251d18c
Compare
build |
@@ -774,9 +774,11 @@ private case class GpuParquetFileFilterHandler( | |||
val clipped = GpuParquetUtils.clipBlocksToSchema(clippedSchema, blocks, isCaseSensitive) | |||
(clipped, clippedSchema) | |||
} | |||
|
|||
val hasDateTimeInReadSchema = DataTypeUtils.hasDateOrTimestampType(readDataSchema) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: There is an isOrContainsDateOrTimestamp
in GpuOverrides
. Can we use it?
spark-rapids/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOverrides.scala
Lines 712 to 713 in c92bbc0
def isOrContainsDateOrTimestamp(dataType: DataType): Boolean = | |
TrampolineUtil.dataTypeExistsRecursively(dataType, dt => dt == TimestampType || dt == DateType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can use it. Since it's a nit, let's merge this PR first.
contributes to #9792
DB use legacy mode as default which is different from regular Sparks
Current Spark-Rapids does not supports legacy mode + non-UTC time zone.
solution
Update the check when legacy mode + non-UTC time zone:
In this way, we can fix some of the cases, of course not all of them.
With this PR, failed cases decreased from 1071 to 123 on DB.
I'll post another PR to fix remaining the failed cases.
Signed-off-by: Chong Gao [email protected]